Robust Techniques for Organizing and Retrieving Spoken Documents

Author

James Allan

Abstract

Information retrieval tasks such as document retrieval and topic detection and tracking (TDT) show little degradation when applied to speech recognizer output. We claim that this robustness stems from redundancy inherent in the problem: not only are words repeated, but semantically related words also provide support. We show how document and query expansion can enhance that redundancy and make document retrieval robust to speech recognition errors. We show that the same holds for TDT’s tracking task, but that recognizer errors are more of an issue for new event detection and story link detection.
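
The redundancy argument in the abstract can be illustrated with a small sketch of query expansion. This is not the paper's method: it is a generic pseudo-relevance-feedback toy, and the names `tokenize`, `score`, `expand_query`, the weights, and the four-document corpus are all assumptions made for illustration. The third document simulates a recognizer error in which "election" was transcribed as "elect shun", so the original one-word query cannot match it.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def score(query_terms, doc_terms):
    """Weighted term overlap: sum of weight * term frequency in the document."""
    counts = Counter(doc_terms)
    return sum(weight * counts[term] for term, weight in query_terms.items())


def expand_query(query, docs, top_docs=2, new_terms=3, expansion_weight=0.5):
    """Add the most frequent non-query terms from the top-ranked documents."""
    query_terms = {term: 1.0 for term in tokenize(query)}
    tokenized = [tokenize(doc) for doc in docs]
    # Rank documents by their score against the original query.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: score(query_terms, tokenized[i]),
        reverse=True,
    )
    # Pool candidate expansion terms from the top-ranked documents.
    pool = Counter()
    for i in ranked[:top_docs]:
        pool.update(t for t in tokenized[i] if t not in query_terms)
    expanded = dict(query_terms)
    for term, _ in pool.most_common(new_terms):
        expanded[term] = expansion_weight
    return expanded


# Hypothetical toy corpus; docs[2] contains a simulated recognition error.
docs = [
    "election results show voters cast ballots in record numbers",
    "voters cast ballots early as election officials counted",
    "the elect shun saw voters cast ballots across the state",
    "the weather today is sunny with light winds",
]
query = "election"

expanded = expand_query(query, docs)
for doc in docs:
    print(score(expanded, tokenize(doc)), doc)
```

With the original query the misrecognized document scores zero. After expansion, the related words that co-occur with "election" in the cleanly transcribed documents give it a nonzero score, while the unrelated weather document still scores zero. A real system would use a proper ranking function and stop-word handling rather than raw term counts.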

Related resources

Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords

This paper describes a Japanese spoken document retrieval system that is robust for Out-of-Vocabulary (OOV) words. A standard approach to spoken document retrieval is to automatically transcribe spoken documents into word sequences, which can be directly matched against queries. In this approach, the documents including OOV words and words misrecognized as other words cannot be retrieved. To av...

Spoken Content-Based Audio Navigation (SCAN)

We describe SCAN, a system for retrieving and browsing speech documents from large audio corpora that uses new information retrieval and speech processing techniques to create easily navigable presentations of documents relevant to a user query. Experiments show that the new interface is more effective than simple speech-alone interfaces.

The MERL SpokenQuery information retrieval system: a system for retrieving pertinent documents from a spoken query

This paper describes some key concepts developed and used in the design of a spoken-query based information retrieval system developed at the Mitsubishi Electric Research Labs (MERL). Innovations in the system include automatic inclusion of signature terms of documents in the recognizer’s vocabulary, the use of uncertainty vectors to represent spoken queries, and a method of indexing that accom...

Open-vocabulary spoken-document retrieval based on query expansion using related web documents

This paper proposes a new method for open-vocabulary spoken-document retrieval based on query expansion using related Web documents. A large vocabulary continuous speech recognition (LVCSR) system first transcribes spoken documents into word sequences, which are then segmented into semantically cohesive units (i.e., stories) using a text segmentation technique. Given a text query word, Web docu...

Fast latent semantic indexing of spoken documents by using self-organizing maps

Journal:

EURASIP J. Adv. Sig. Proc.

Volume 2003

Publication date: 2003
